layout generation
- Asia > China > Shaanxi Province > Xi'an (0.04)
- North America > United States (0.04)
- Asia > Middle East > Israel (0.04)
HiCo: Hierarchical Controllable Diffusion Model for Layout-to-image Generation
The task of layout-to-image generation involves synthesizing images based on the captions of objects and their spatial positions. Existing methods still struggle with complex layout generation, where common failure cases include missing objects, inconsistent lighting, conflicting view angles, etc. To effectively address these issues, we propose a Hierarchical Controllable (HiCo) diffusion model for layout-to-image generation, featuring an object-separable conditioning branch structure. Our key insight is to achieve spatial disentanglement through hierarchical modeling of layouts. We use a multi-branch structure to represent the hierarchy and aggregate the branches in a fusion module. To evaluate the performance of multi-objective controllable layout generation in natural scenes, we introduce the HiCo-7K benchmark, derived from the GRIT-20M dataset and manually cleaned.
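A minimal, illustrative sketch of the object-separable branch plus fusion idea described in the HiCo abstract above; this is not the HiCo implementation, and all module names, shapes, and the masked-sum fusion are assumptions for demonstration only.

```python
# Sketch: one conditioning branch per layout object, aggregated by a fusion module.
import torch
import torch.nn as nn

class BranchEncoder(nn.Module):
    """Encodes one (caption embedding, box mask) pair into a feature map."""
    def __init__(self, text_dim=32, feat_dim=16):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, feat_dim)
        self.conv = nn.Conv2d(feat_dim + 1, feat_dim, kernel_size=3, padding=1)

    def forward(self, text_emb, box_mask):
        # text_emb: (B, text_dim); box_mask: (B, 1, H, W)
        B, _, H, W = box_mask.shape
        t = self.text_proj(text_emb)[:, :, None, None].expand(B, -1, H, W)
        return self.conv(torch.cat([t, box_mask], dim=1))

class FusionModule(nn.Module):
    """Aggregates per-object branch features into one conditioning map."""
    def __init__(self, feat_dim=16):
        super().__init__()
        self.mix = nn.Conv2d(feat_dim, feat_dim, kernel_size=1)

    def forward(self, branch_feats):
        # branch_feats: list of (B, feat_dim, H, W); masked sum keeps objects disentangled
        return self.mix(torch.stack(branch_feats, dim=0).sum(dim=0))

# Toy usage: two objects on a 3-sample batch over an 8x8 latent grid.
enc, fuse = BranchEncoder(), FusionModule()
feats = [enc(torch.randn(3, 32), torch.rand(3, 1, 8, 8).round()) for _ in range(2)]
cond = fuse(feats)  # (3, 16, 8, 8) conditioning signal for a diffusion UNet
print(cond.shape)
```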
Visual Programming for Step-by-Step Text-to-Image Generation and Evaluation
As large language models have demonstrated impressive performance in many domains, recent works have adopted language models (LMs) as controllers of visual modules for vision-and-language tasks. While existing work focuses on equipping LMs with visual understanding, we propose two novel interpretable/explainable visual programming frameworks for text-to-image (T2I) generation and evaluation. First, we introduce VPGen, an interpretable step-by-step T2I generation framework that decomposes T2I generation into three steps: object/count generation, layout generation, and image generation. We employ an LM to handle the first two steps (object/count generation and layout generation), by finetuning it on text-layout pairs. Our step-by-step T2I generation framework provides stronger spatial control than end-to-end models, the dominant approach for this task.
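A minimal pipeline sketch of the step-by-step decomposition described in the VPGen abstract above. The prompt formats and the `call_layout_lm` / `render_image` helpers are hypothetical stand-ins; the paper's actual model and interfaces may differ.

```python
def call_layout_lm(prompt: str) -> str:
    """Placeholder for the finetuned layout LM; returns a canned response here."""
    return ("dog: 2; frisbee: 1 | dog [0.1,0.5,0.4,0.9]; "
            "dog [0.5,0.5,0.8,0.9]; frisbee [0.4,0.2,0.6,0.4]")

def generate_objects_and_counts(caption: str) -> str:
    # Step 1: ask the LM which objects appear and how many of each.
    return call_layout_lm(f"Caption: {caption}\nStep 1 - objects and counts:").split("|")[0].strip()

def generate_layout(caption: str, objects: str) -> list[tuple[str, list[float]]]:
    # Step 2: ask the LM for a bounding box per object instance.
    raw = call_layout_lm(f"Caption: {caption}\nObjects: {objects}\nStep 2 - boxes:").split("|")[1]
    boxes = []
    for item in raw.split(";"):
        name, coords = item.strip().split(" [")
        boxes.append((name, [float(v) for v in coords.rstrip("]").split(",")]))
    return boxes

def render_image(caption: str, layout):
    """Step 3: hand the caption plus boxes to a layout-to-image generator (stubbed)."""
    return f"<image for '{caption}' with {len(layout)} boxes>"

caption = "two dogs chasing a frisbee"
objects = generate_objects_and_counts(caption)
layout = generate_layout(caption, objects)
print(objects, layout, render_image(caption, layout), sep="\n")
```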
Tokenizing Buildings: A Transformer for Layout Synthesis
de Guevara, Manuel Ladron, Rhee, Jinmo, Bidgoli, Ardavan, Razgaitis, Vaidas, Bergin, Michael
We introduce Small Building Model (SBM), a Transformer-based architecture for layout synthesis in Building Information Modeling (BIM) scenes. We address the question of how to tokenize buildings by unifying heterogeneous feature sets of architectural elements into sequences while preserving compositional structure. Such feature sets are represented as a sparse attribute-feature matrix that captures room properties. We then design a unified embedding module that learns joint representations of categorical and possibly correlated continuous feature groups. Lastly, we train a single Transformer backbone in two modes: an encoder-only pathway that yields high-fidelity room embeddings, and an encoder-decoder pipeline for autoregressive prediction of room entities--referred to as Data-Driven Entity Prediction (DDEP). Experiments across retrieval and generative layout synthesis show that SBM learns compact room embeddings that reliably cluster by type and topology, enabling strong semantic retrieval. In DDEP mode, SBM produces functionally sound layouts--with fewer collisions and boundary violations and improved navigability.
- North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.14)
- North America > United States (0.04)
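An illustrative sketch of the "unified embedding" idea from the SBM abstract above: categorical room attributes get learned embeddings, a grouped continuous feature vector gets a linear projection, and the pieces are summed into one token per room entity. Feature names and dimensions are assumptions, not the paper's actual schema.

```python
import torch
import torch.nn as nn

class RoomTokenizer(nn.Module):
    def __init__(self, n_room_types=20, n_levels=8, cont_dim=6, d_model=64):
        super().__init__()
        self.type_emb = nn.Embedding(n_room_types, d_model)   # categorical: room type
        self.level_emb = nn.Embedding(n_levels, d_model)       # categorical: floor level
        self.cont_proj = nn.Linear(cont_dim, d_model)          # continuous group: area, bounds, ...

    def forward(self, room_type, level, cont_feats):
        # room_type, level: (B, N) int tensors; cont_feats: (B, N, cont_dim)
        return self.type_emb(room_type) + self.level_emb(level) + self.cont_proj(cont_feats)

# Toy usage: 2 buildings with 5 room entities each become a (2, 5, 64) token sequence,
# ready for a Transformer encoder (e.g. nn.TransformerEncoder).
tok = RoomTokenizer()
tokens = tok(torch.randint(0, 20, (2, 5)), torch.randint(0, 8, (2, 5)), torch.randn(2, 5, 6))
print(tokens.shape)
```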
M3DLayout: A Multi-Source Dataset of 3D Indoor Layouts and Structured Descriptions for 3D Generation
Zhang, Yiheng, Cai, Zhuojiang, Wang, Mingdao, Guo, Meitong, Li, Tianxiao, Lin, Li, Wang, Yuwang
In text-driven 3D scene generation, object layout serves as a crucial intermediate representation that bridges high-level language instructions with detailed geometric output. It not only provides a structural blueprint for ensuring physical plausibility but also supports semantic controllability and interactive editing. However, the learning capabilities of current 3D indoor layout generation models are constrained by the limited scale, diversity, and annotation quality of existing datasets. To address this, we introduce M3DLayout, a large-scale, multi-source dataset for 3D indoor layout generation. M3DLayout comprises 21,367 layouts and over 433k object instances, integrating three distinct sources: real-world scans, professional CAD designs, and procedurally generated scenes. Each layout is paired with detailed structured text describing global scene summaries, relational placements of large furniture, and fine-grained arrangements of smaller items. This diverse and richly annotated resource enables models to learn complex spatial and semantic patterns across a wide variety of indoor environments. To assess the potential of M3DLayout, we establish a benchmark using both a text-conditioned diffusion model and a text-conditioned autoregressive model. Experimental results demonstrate that our dataset provides a solid foundation for training layout generation models. Its multi-source composition enhances diversity, notably through the Inf3DLayout subset, which provides rich small-object information, enabling the generation of more complex and detailed scenes. We hope that M3DLayout can serve as a valuable resource for advancing research in text-driven 3D scene synthesis. The dataset and code will be made public upon acceptance.
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- Europe > Switzerland (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.67)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)
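A hypothetical record sketch of how one M3DLayout sample could be represented in code, based only on the abstract above (layout objects plus three levels of structured text). All field names and the example values are assumptions; the released dataset may use a different schema.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectInstance:
    category: str                          # e.g. "sofa"
    position: tuple[float, float, float]   # object center in scene coordinates
    size: tuple[float, float, float]       # bounding-box extents
    yaw: float                             # rotation about the vertical axis, in radians

@dataclass
class LayoutSample:
    scene_summary: str                     # global description of the room
    large_furniture_text: str              # relational placement of large furniture
    small_item_text: str                   # fine-grained arrangement of smaller items
    objects: list[ObjectInstance] = field(default_factory=list)

sample = LayoutSample(
    scene_summary="A compact living room with a seating area facing a TV wall.",
    large_furniture_text="The sofa faces the TV stand across a rug.",
    small_item_text="A lamp sits on the side table left of the sofa.",
    objects=[ObjectInstance("sofa", (1.0, 0.0, 2.0), (2.0, 0.9, 0.9), 3.14)],
)
print(len(sample.objects), sample.scene_summary)
```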
APD-Agents: A Large Language Model-Driven Multi-Agents Collaborative Framework for Automated Page Design
Chen, Xinpeng, Han, Xiaofeng, Zhang, Kaihao, Ren, Guochao, Wang, Yujie, Cao, Wenhao, Zhou, Yang, Lu, Jianfeng, Song, Zhenbo
Layout design is a crucial step in developing mobile app pages. However, crafting satisfactory designs is time-intensive for designers: they need to consider which controls and content to present on the page, and then repeatedly adjust their size, position, and style for better aesthetics and structure. Although many design tools can now help perform these repetitive tasks, extensive training is needed to use them effectively. Moreover, collaborative design across app pages demands extra time to align standards and ensure consistent styling. In this work, we propose APD-agents, a large language model (LLM)-driven multi-agent framework for automated page design in mobile applications. Our framework contains OrchestratorAgent, SemanticParserAgent, PrimaryLayoutAgent, TemplateRetrievalAgent, and RecursiveComponentAgent. Upon receiving the user's description of the page, the OrchestratorAgent dynamically directs the other agents to accomplish the user's design task. Specifically, the SemanticParserAgent is responsible for converting the user's description of page content into structured data. The PrimaryLayoutAgent generates an initial coarse-grained layout of the page. The TemplateRetrievalAgent fetches semantically relevant few-shot examples to enhance the quality of layout generation. Finally, for each element in the layout, the RecursiveComponentAgent decides how to recursively generate all the fine-grained sub-elements it contains. Our work fully leverages the automatic collaboration capabilities of LLM-driven multi-agent systems. Experimental results on the RICO dataset show that our APD-agents achieve state-of-the-art performance.
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Asia > China > Heilongjiang Province > Harbin (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Research Report (0.82)
- Workflow (0.49)
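A minimal orchestration sketch inspired by the APD-agents description above. The agent classes, their interfaces, and the stubbed LLM call are placeholders (assumptions), not the authors' implementation; the TemplateRetrievalAgent's few-shot retrieval step is omitted for brevity.

```python
def llm(prompt: str) -> str:
    """Stub standing in for an LLM call; returns canned text for the demo."""
    return "header, product list, checkout button"

class SemanticParserAgent:
    def run(self, description: str) -> dict:
        # Convert the free-text page description into structured element data.
        return {"elements": [e.strip() for e in llm(f"Parse page content: {description}").split(",")]}

class PrimaryLayoutAgent:
    def run(self, parsed: dict) -> list[dict]:
        # Coarse layout: stack the parsed elements top-to-bottom in normalized coordinates.
        n = len(parsed["elements"])
        return [{"name": e, "bbox": [0.0, i / n, 1.0, (i + 1) / n]}
                for i, e in enumerate(parsed["elements"])]

class RecursiveComponentAgent:
    def run(self, element: dict) -> dict:
        # Recursively expand an element into fine-grained sub-elements (toy rule here).
        element["children"] = [{"name": "product card"}] if element["name"] == "product list" else []
        return element

class OrchestratorAgent:
    def run(self, description: str) -> list[dict]:
        parsed = SemanticParserAgent().run(description)
        layout = PrimaryLayoutAgent().run(parsed)
        return [RecursiveComponentAgent().run(el) for el in layout]

print(OrchestratorAgent().run("A shopping page with a header, product list and checkout button."))
```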
Sketch-to-Layout: Sketch-Guided Multimodal Layout Generation
Brioschi, Riccardo, Alekseev, Aleksandr, Nevali, Emanuele, Döner, Berkay, Malki, Omar El, Mitrevski, Blagoj, Kieliger, Leandro, Collier, Mark, Maksai, Andrii, Berent, Jesse, Musat, Claudiu, Kokiopoulou, Efi
Graphic layout generation is a growing research area focusing on generating aesthetically pleasing layouts ranging from poster designs to documents. While recent research has explored ways to incorporate user constraints to guide the layout generation, these constraints often require complex specifications which reduce usability. We introduce an innovative approach exploiting user-provided sketches as intuitive constraints and we demonstrate empirically the effectiveness of this new guidance method, establishing the sketch-to-layout problem as a promising research direction, which is currently under-explored. To tackle the sketch-to-layout problem, we propose a multimodal transformer-based solution using the sketch and the content assets as inputs to produce high quality layouts. Since collecting sketch training data from human annotators to train our model is very costly, we introduce a novel and efficient method to synthetically generate training sketches at scale. We train and evaluate our model on three publicly available datasets: PubLayNet [43], DocLayNet [32] and SlidesVQA [35], demonstrating that it outperforms state-of-the-art constraint-based methods, while offering a more intuitive design experience. In order to facilitate future sketch-to-layout research, we release O(200k) synthetically-generated sketches for the public datasets above.
- Asia > Middle East > Jordan (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > Middle East > Cyprus > Nicosia > Nicosia (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
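A toy sketch of the synthetic-training-sketch idea mentioned in the Sketch-to-Layout abstract above: rasterize ground-truth layout boxes into a rough, jittered line drawing that can stand in for a user sketch. The jitter scheme and canvas format here are assumptions, not the paper's actual generation method.

```python
import numpy as np

def boxes_to_sketch(boxes, size=64, jitter=0.02, seed=0):
    """boxes: list of (x0, y0, x1, y1) in [0, 1]; returns a (size, size) binary canvas."""
    rng = np.random.default_rng(seed)
    canvas = np.zeros((size, size), dtype=np.uint8)
    for box in boxes:
        # Perturb the box corners slightly so the "sketch" looks hand-drawn rather than exact.
        x0, y0, x1, y1 = (np.clip(np.asarray(box) + rng.normal(0, jitter, 4), 0, 1) * (size - 1)).astype(int)
        x0, x1 = sorted((x0, x1))
        y0, y1 = sorted((y0, y1))
        canvas[y0, x0:x1 + 1] = 1   # top edge
        canvas[y1, x0:x1 + 1] = 1   # bottom edge
        canvas[y0:y1 + 1, x0] = 1   # left edge
        canvas[y0:y1 + 1, x1] = 1   # right edge
    return canvas

# Example: two layout boxes become a crude rectangle sketch on a 64x64 grid.
sketch = boxes_to_sketch([(0.1, 0.1, 0.9, 0.3), (0.1, 0.4, 0.5, 0.9)])
print(sketch.sum(), sketch.shape)
```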
Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning
Ran, Xingjian, Li, Yixuan, Xu, Linning, Yu, Mulin, Dai, Bo
Realistic 3D indoor scene synthesis is vital for embodied AI and digital content creation. It can be naturally divided into two subtasks: object generation and layout generation. While recent generative models have significantly advanced object-level quality and controllability, layout generation remains challenging due to limited datasets. Existing methods either overfit to these datasets or rely on predefined constraints to optimize numerical layouts, which sacrifices flexibility. As a result, they fail to generate scenes that are both open-vocabulary and aligned with fine-grained user instructions. We introduce DirectLayout, a framework that directly generates numerical 3D layouts from text descriptions using the generalizable spatial reasoning of large language models (LLMs). DirectLayout decomposes the generation into three stages: producing a Bird's-Eye View (BEV) layout, lifting it into 3D space, and refining object placements. To enable explicit spatial reasoning and help the model grasp basic principles of object placement, we employ Chain-of-Thought (CoT) Activation based on the 3D-Front dataset. Additionally, we design a CoT-Grounded Generative Layout Reward to enhance generalization and spatial planning. During inference, DirectLayout addresses asset-layout mismatches via Iterative Asset-Layout Alignment through in-context learning. Extensive experiments demonstrate that DirectLayout achieves impressive semantic consistency, generalization and physical plausibility.
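A minimal sketch of the three-stage pipeline described in the DirectLayout abstract above (BEV layout, lift to 3D, refine). The LLM call is stubbed, and the prompts, height priors, and refinement rule are assumptions rather than the authors' method.

```python
def llm_bev_layout(description: str) -> list[dict]:
    """Stage 1 (stubbed): an LLM would return 2D BEV boxes with categories for the scene."""
    return [{"category": "bed", "bev_box": [0.1, 0.1, 0.6, 0.5]},
            {"category": "nightstand", "bev_box": [0.62, 0.1, 0.75, 0.25]}]

def lift_to_3d(bev_objects: list[dict], height_priors: dict) -> list[dict]:
    """Stage 2: attach a vertical extent to each BEV box using per-category height priors."""
    return [{**obj, "z_range": [0.0, height_priors.get(obj["category"], 1.0)]}
            for obj in bev_objects]

def refine(objects: list[dict]) -> list[dict]:
    """Stage 3 (toy): a real refiner would check collisions and adjust placements; here it is a no-op."""
    return objects

scene = refine(lift_to_3d(llm_bev_layout("a small bedroom with a bed and a nightstand"),
                          {"bed": 0.5, "nightstand": 0.6}))
print(scene)
```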